ABSTRACT


We introduce and investigate a distributed computation model
for querying the Web. Web queries are computed by interacting
automata running at different nodes in the Web. The automata
which we are concerned with can be viewed as register automata
equipped with an additional communication component. We
identify conditions necessary and sufficient for systems of
automata to compute Web queries, and investigate the
computational power of such systems.


1. INTRODUCTION 


Much attention has recently been paid to querying the Web 
[5]. A salient feature of queries and computations on the 
Web is their browsing nature: unlike a conventional database, 
the Web is usually explored navigationally, starting from a 
particular node in the Web (e.g., the user’s homepage). This 
has led Abiteboul and Vianu [2] to formally define a Web 
query as a mapping from pairs (I, s) to sets of nodes in I,
where I is a Web instance and s is the source of the query,
i.e., the node from which we start exploring the Web. Various
kinds of machines specially tailored for computations
on the Web have been introduced and studied by Abiteboul 
and Vianu in this context, in particular browser machines, 
which can be viewed as Turing machines navigating the Web 
by following links. 


Another recent development is that of Internet supercom- 
puting [6, 7], where many individual computers linked to 
the Internet collaborate in a distributed computation. An 
appealing and popular example is the SETI@home project, 
which scans radio signals from space for signs of extrater- 
restrial intelligence [12]. 


In this paper, we combine these two lines of research. More 
precisely, we investigate the possibilities and limitations of 
Web automata, a computation model for querying the Web, 
which is (like the browser machine model) purely navigational,
but which is also distributed. Starting at a given
source node, a finite portion of the Web, reachable from 
the source by following links, is populated with lightweight 
processes. Typically, this portion is determined by specify- 
ing a maximum number of links that can be followed (as 
commonly done in tools for off-line browsing and Web mir- 
roring). The processes run concurrently and follow a pro- 
gram specified essentially as a finite register automaton [11]. 
They report back to the source process by sending messages 
upwards along the edges of a spanning tree, a standard net- 
work topology used in computer networks and distributed 
computation [3, 18]. 


Our investigation offers the following contributions: 


(i) We define a fair, efficient, and easy to enforce communi- 
cation protocol by which distributed computations proceed 
in rounds. Each round of a distributed computation has a 
layered structure according to the levels of a spanning tree. 
The Web automata (or, processes) at each level of the span- 
ning tree run concurrently, and each level takes only con- 
stant parallel time. After each round, the source automa- 
ton, i.e., the Web automaton running at the source node, 
is guaranteed to have received enough data in order to de- 
cide whether to continue for another round, or to terminate 
the computation. In addition, the source automaton may 
produce output. We identify a decidable property of Web 
automata, called productivity, which enables this protocol. 
Testing productivity is PSPACE-complete. 


(ii) Since the order of upward communications within a layer 
is not fixed, there may exist many different distributed runs 
for a given spanning tree. On top of that, the spanning tree 
itself arises out of the computation and is thus not a pri- 
ori fixed. We call a Web automaton sound if it produces 
the same output for every possible distributed run on every 
possible spanning tree. Every sound Web automaton com- 
putes a well-defined Web query. We show that soundness is 
undecidable. (This is not entirely evident, given the finite 
nature of Web automata and the rather rigid communication 
protocol which they must follow.) 


(iii) Although a sound Web automaton computes a Web 
query, one may have to wait many rounds before seeing 
any new output. This can be quite undesirable in practice.
We call a Web automaton continuous if it produces new output
in every round. Note the analogy with Abiteboul and Vianu’s
distinction between finitely and eventually
computable Web queries: in the case of a query which is only 
eventually computable, one can never be sure that the entire 
output has actually been output. Note also that a continu- 
ous Web automaton which computes a boolean query (i.e., 
a query with yes/no answers) already knows the correct an- 
swer after the first round. We show that continuity is also 
undecidable. 


(iv) Assuming that at every node there is an ordering of the 
outgoing links available (a very natural assumption in the 
context of the Web), we can show that every logarithmic- 
space computable Web query is computable by a Web au- 
tomaton. 


(v) We furthermore introduce decide-and-forward automata, 
which constitute a natural, syntactic subclass of Web au- 
tomata. A decide-and-forward (DF) automaton makes all 
the crucial decisions already after the first round; the subse- 
quent rounds are pure forwarding rounds which merely flush 
the remaining contents of the communication queues to the 
output. We show that soundness and continuity of DF automata
become decidable in the monadic case, i.e., when
the tests performed by automata are based on unary predi- 
cates of Web nodes only. We also give a characterization of 
the Web queries continuously computable by monadic DF 
automata in terms of a fragment of first-order logic. 


(vi) Finally, we study browser stack machines, a restricted 
variant of Abiteboul and Vianu’s browser machines. Our 
restricted browser machines have only a finite work memory 
and use their Turing tape in a stack-like manner, similar 
to the three familiar surf actions of common Web browsers: 
‘follow this link’, ‘go back’, and ‘go forward’. We show that, 
if the depth of the stack is bounded (so that the machine 
cannot get ‘lost in hyperspace’), then every Web query com- 
putable by a browser stack machine is computable by a Web 
automaton. 


Related Work. Distributed Web querying systems (simi- 
lar to our theoretical model) have already been implemented, 
e.g., the DIASPORA system [8, 15]. As for theoretical work, 
we are aware of only a few publications on navigational Web
querying, most notably the original papers by Abiteboul 
and Vianu [2], and Mendelzon and Milo [13]. Both papers 
focus on computational completeness, while we work with 
a limited computation model and focus on distribution and 
efficiency. Abiteboul and Vianu also proposed a distributed 
evaluation algorithm for regular path queries [1]. A very 
recent proposal of a formal model for Web querying using 
concurrent agents was made by Sazonov [16]. His model is 
based not on finite automata but on a set-theoretic term 
language. Of course, finite automata working over abstract 
domains (in our case, Web nodes) rather than over finite 
alphabets have been considered before, e.g., in the study of 
regular languages over infinite alphabets [11, 14]. Finally, 
we mention that our definition of distributed runs of Web 
automata is inspired by Gurevich’s definition of partially 
ordered runs of distributed abstract state machines [9]. 


Outline. In the next section, we recall the definition of Web 
queries, slightly adapted for our purposes. In Sections 3 and
4, we introduce our automaton model and define distributed
runs of systems of automata. In Sections 5 and 6, we con- 
sider automata suitable for computing Web queries and in- 
troduce DF automata. We conclude in Section 7 by draw- 
ing a connection between Abiteboul and Vianu’s browser 
machines and our computation model. 


2. WEB QUERIES 


We consider Web queries in the spirit of Abiteboul and 
Vianu [2]. Since in this paper we are mainly concerned with 
computations on finite portions of the Web, we focus on Web 
queries on finite instances of the Web. 


Web Instances. In the following, Σ denotes a finite,
relational vocabulary containing (at least) the binary relation
symbol Link. A Web instance I is a finite structure over Σ. We
view the elements of I as abstractions of Web pages, Web
sites, or other objects on the Web, and call them nodes. The
ordered pairs in the binary relation Link^I represent links in
the Web. The other relations of I are abstractions of semantic
predicates which a Web query may apply to nodes. For
example, a unary relation R1(x) could stand for “Web page
x contains the keyword Madison”, a binary relation R2(x, y)
could stand for “the link from x to y is labeled Madison”,
and a ternary relation R3(x, y, z) could stand for “on page
x, all links to y precede all links to z”. The vocabulary will
thus vary from query to query.


Although a Web instance is nothing but a standard rela- 
tional database with at least one binary relation, there is 
a crucial difference between querying a relational database 
and querying the Web: to answer a Web query, one can 
examine the ‘database’ only by following links, starting at 
some source node. This leads us to the next basic definition. 


Web Queries. A Web query Q over Σ is a mapping that
assigns to every pair (I, s), where I is a Web instance over
Σ and s is a node in I, a set of nodes in I. The node s is
also called the source (of the query). Following the standard
genericity criterion for database queries, we require that Q
preserves isomorphisms: if (I′, s′) is isomorphic to (I, s)
via an isomorphism π, then Q(I′, s′) = π(Q(I, s)). Furthermore,
since we will consider purely navigational computation
models only, it is only fair to accordingly require that
Q(I, s) = Q(Reach(I, s), s), where Reach(I, s) denotes the
substructure of I generated by the nodes reachable from the
source s by following links.


Notice that every first-order formula φ(s, x) over Σ with
free(φ) = {s, x} defines a Web query Q_φ over Σ:

$$Q_\varphi(I, s) := \{\, n : \mathrm{Reach}(I, s) \models \varphi[s, n] \,\}.$$
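The two requirements above, and the definition of Q_φ, can be made concrete with a small Python sketch. It assumes a hypothetical encoding of a Web instance as a dictionary from relation names to sets of tuples, and it replaces the first-order formula φ(s, x) by a Python predicate evaluated on Reach(I, s); neither choice comes from the model itself.

```python
from collections import deque
from typing import Callable, Dict, Set

Node = str
# Hypothetical encoding: relation name -> set of tuples of nodes.
Instance = Dict[str, Set[tuple]]
# phi(sub, nodes, s, x): a Python stand-in for a first-order formula phi(s, x).
Formula = Callable[[Instance, Set[Node], Node, Node], bool]


def reach_nodes(inst: Instance, s: Node) -> Set[Node]:
    """Nodes reachable from s by following Link edges (s included)."""
    succ: Dict[Node, Set[Node]] = {}
    for a, b in inst.get("Link", set()):
        succ.setdefault(a, set()).add(b)
    seen, frontier = {s}, deque([s])
    while frontier:
        u = frontier.popleft()
        for v in succ.get(u, set()):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen


def reach(inst: Instance, s: Node) -> Instance:
    """Reach(I, s): the substructure induced by the nodes reachable from s."""
    nodes = reach_nodes(inst, s)
    return {rel: {t for t in tuples if all(e in nodes for e in t)}
            for rel, tuples in inst.items()}


def q_phi(phi: Formula, inst: Instance, s: Node) -> Set[Node]:
    """Q_phi(I, s) = { n : Reach(I, s) |= phi[s, n] }."""
    nodes = reach_nodes(inst, s)
    sub = reach(inst, s)
    return {n for n in nodes if phi(sub, nodes, s, n)}


# phi(s, x) := Interesting(x); c is interesting but not reachable from s.
inst = {"Link": {("s", "a"), ("a", "b"), ("c", "a")},
        "Interesting": {("a",), ("b",), ("c",)}}
print(sorted(q_phi(lambda sub, nodes, s, x: (x,) in sub["Interesting"], inst, "s")))  # ['a', 'b']
```

Because φ is evaluated on Reach(I, s) rather than on I, the resulting query automatically satisfies Q(I, s) = Q(Reach(I, s), s).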


3. WEB AUTOMATA 


We begin the introduction of our computation model by 
defining 


• Web automata, a variant of register automata equipped
with an additional communication component, and

• runs of Web automata at individual Web nodes, which
we refer to as local runs.


Distributed runs of systems of Web automata are the subject of
the next section.


Web Automata. Our automata are specified by simple,
rule-based programs defined as follows. Expand Σ to a
vocabulary Σ⁺ by adding three constant symbols 0, 1, and
⊥, and a unary relation symbol Source. Intuitively, 0 and
1 represent the boolean values false and true, respectively,
⊥ denotes the empty queue (of an automaton), and Source
indicates the source node. Fix some tuple r̄ = (r_1, ..., r_ℓ)
of variables (representing the registers of an automaton).


A guard (of a rule) is a quantifier-free first-order formula
φ(x, y, r̄) over Σ⁺ with free(φ) ⊆ {x, y, r̄}. As will become
clear soon, x will be interpreted as the node at which an 
automaton is running locally, and y will be interpreted as 
the head of the automaton’s queue. The queue will contain 
incoming messages sent by automata running at other nodes. 


A rule is an expression of the form
if φ then action

where φ is a guard, and action is an update action given
by an expression of the form (r_i := t), or a send action
given by an expression of the form send(t̄). Here, t is a term
in {x, y, r_1, ..., r_ℓ, 0, 1}, and t̄ is a finite (possibly empty)
sequence of such terms.


A program is now simply a finite set of rules. 


Example 1. Suppose that Y contains a unary relation sym- 
bol Interesting. Below, we display a program which employs 
a register r to distinguish between the first computation 
step and all subsequent steps. In the first step, the program 
checks whether it is running at an ‘interesting’ node. If so, 
it sends a message containing this node. In all subsequent 
steps, it basically forwards the contents of its queue: in each 
step, the first item in the queue is sent in a message of length 
1, after which the item is removed from the queue. In the 
program, “this_node” and “head_q” stand for the variables x
and y, respectively.


if (r = 0) ∧ Interesting(this_node) then send(this_node)
if (r = 0) then r := 1
if (r = 1) ∧ (head_q ≠ ⊥) then send(head_q)
if (r = 1) ∧ (head_q = ⊥) then send()


Definition 1. A Web automaton A is a triple (Σ, r̄, Π)
consisting of a vocabulary Σ, a tuple r̄ of (register) variables,
and a program Π over Σ⁺ and r̄.


Next, we define a notion of run which reflects the behavior 
of a Web automaton when observed at a particular node. 


Local Runs. Let A = (Σ, r̄, Π) be an automaton, let I be an
instance over Σ, and let s be a (source) node in I. Expand
I to a structure I⁺ over Σ⁺ by adding three new elements
0, 1, and ⊥, and by interpreting the unary relation symbol
Source as the singleton set {s}. In the following, the words
queue and message both refer to a finite sequence of bits
and nodes (in I). If q is a queue, then head(q) denotes the
first element of q; the sequence of the remaining elements is
denoted by tail(q). The head of the empty queue is defined
to be ⊥.


Consider a node n in I. A configuration of A at n is a triple
(n, q, ā) where q is a queue and ā is a tuple (a_1, ..., a_ℓ) of
bits and nodes (where we assume that ℓ is the number of
registers of A). Intuitively, a_i is the content of register r_i in
this particular configuration.


Consider a configuration (n, q, ā). A program rule in Π with
guard φ(x, y, r̄) is said to be enabled in (n, q, ā) if
I⁺ ⊨ φ[n, head(q), ā].


The successor configuration of (n, q, ā) is the configuration
(n, q′, ā′) where q′ = tail(q) and for each i ∈ {1, ..., ℓ} the
following condition is satisfied: if there is precisely one
r_i-update rule in Π which is enabled in (n, q, ā), and (r_i := t)
is the right-hand side of this rule, then a′_i = t[x/n,
y/head*(q), r̄/ā]; otherwise, a′_i = a_i. Here, head* maps the
empty queue to 0, but is otherwise defined as head.


We say that A sends a message m in (n, q, ā) if there is
precisely one send rule in Π which is enabled in (n, q, ā)
and, if send(t̄) is the right-hand side of this rule, m = t̄[x/n,
y/head*(q), r̄/ā].


A local run of A at node n (in I with source s) is a finite
or infinite sequence (C_i)_{i∈K} of configurations of A at n,
where K is an initial segment of ℕ (possibly ℕ itself), such
that for every i + 1 ∈ K

• C_{i+1} is a successor configuration of C_i, and

• if A sends the empty message in C_i, then C_{i+1} is the
last configuration of (C_i)_{i∈K}.
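The definitions of enabled rules, successor configurations, sent messages, and local runs can be prototyped as follows. This is a minimal Python sketch under our own encoding assumptions: guards and terms are Python callables (the instance, here only the set `interesting`, is captured by closure), ⊥ is represented by the string "⊥", and all names such as `Config`, `Program`, `step`, and `local_run` are hypothetical. The program at the end transcribes Example 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union

BOTTOM = "⊥"                 # stands for the constant ⊥ (head of the empty queue)
Value = Union[int, str]      # bits 0/1 or node names
Regs = Tuple[Value, ...]

# Guards see head(q); terms see head*(q). Both also see x (this node) and the registers.
Guard = Callable[[str, Value, Regs], bool]
Term = Callable[[str, Value, Regs], Value]


@dataclass(frozen=True)
class Config:
    node: str
    queue: Tuple[Value, ...]
    regs: Regs


@dataclass
class Program:
    updates: List[Tuple[Guard, int, Term]]   # (guard, i, t) encodes: if guard then r_i := t
    sends: List[Tuple[Guard, List[Term]]]    # (guard, ts) encodes: if guard then send(ts)


def step(prog: Program, c: Config) -> Tuple[Config, Optional[Tuple[Value, ...]]]:
    """Successor configuration of c, plus the message sent in c (None if nothing is sent)."""
    head = c.queue[0] if c.queue else BOTTOM     # head(q)
    head_star = c.queue[0] if c.queue else 0     # head*(q): the empty queue is mapped to 0
    regs = list(c.regs)
    for i in range(len(regs)):
        enabled = [t for g, j, t in prog.updates if j == i and g(c.node, head, c.regs)]
        if len(enabled) == 1:                    # precisely one enabled r_i-update rule
            regs[i] = enabled[0](c.node, head_star, c.regs)
    sends = [ts for g, ts in prog.sends if g(c.node, head, c.regs)]
    msg = tuple(t(c.node, head_star, c.regs) for t in sends[0]) if len(sends) == 1 else None
    return Config(c.node, c.queue[1:], tuple(regs)), msg


def local_run(prog: Program, c: Config, max_steps: int):
    """Configurations C_0, C_1, ... and the message sent in each C_i (None = no message).
    Stops at the configuration that follows an empty message, or after max_steps steps."""
    configs, messages = [c], []
    for _ in range(max_steps):
        nxt, msg = step(prog, configs[-1])
        configs.append(nxt)
        messages.append(msg)
        if msg == ():
            break
    return configs, messages


# Example 1 with a single register r (regs[0]); the set of 'interesting' nodes is an assumption.
interesting = {"a"}
example1 = Program(
    updates=[(lambda x, h, r: r[0] == 0, 0, lambda x, y, r: 1)],
    sends=[
        (lambda x, h, r: r[0] == 0 and x in interesting, [lambda x, y, r: x]),
        (lambda x, h, r: r[0] == 1 and h != BOTTOM, [lambda x, y, r: y]),
        (lambda x, h, r: r[0] == 1 and h == BOTTOM, []),
    ],
)

_, msgs = local_run(example1, Config("a", ("n1", "n2"), (1,)), max_steps=10)
print(msgs)  # [('n1',), ('n2',), ()]
```

Representing "precisely one enabled rule" by counting enabled rules keeps the sketch close to the definition: if two send rules are enabled at once, nothing is sent.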


Remark 1. The reader may wonder why our automata 
need to be able to send messages of length longer than 1. 
After all, an automaton could send a message component- 
wise, i.e., bit by bit, node by node, as messages of length 
1. However, in the next section, we will consider systems
of communicating automata where a receiving automaton 
may obtain messages from many different automata, and 
these messages can be intermingled during communication. 
In particular, the order in which messages occur in the queue 
of the receiving automaton can be arbitrary. If an automa- 
ton sends a message component-wise, the receiving automa- 
ton may not be able to reconstruct the original message from 
its components. In this context, note that messages longer 
than 1 can always be flanked by separators (e.g., special bit 
sequences), enabling a receiving automaton to distinguish 
between different messages. 


Notice that a local run can start in any configuration. This is 
because in a distributed scenario an automaton may receive 
some messages even before it starts its own local
computation.


We are particularly interested in automata where the time 
between two send actions is bounded by a constant. 


Definition 2. Let k ≥ 1 be a natural number. A Web
automaton A is k-productive if in every local run (C_i)_{i<k}
of A, there is at least one configuration in which A sends a
message. A is productive if it is k-productive for some k.


As an example, recall the program in Example 1, and verify 
that the automaton defined by this program is 2-productive. 
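The 2-productivity claim can be sanity-checked by brute force. The Python sketch below hand-translates the four rules of Example 1 into a step function that follows the successor and send definitions, and tries every starting configuration in which the register holds 0 or 1 (an assumption: Example 1 never writes any other value into r) with queues of length 0, 1, and 2. It is an illustration only, not a decision procedure for productivity; the function names are hypothetical.

```python
from itertools import product
from typing import Optional, Tuple


def example1_step(interesting: bool, r: int, queue: Tuple[str, ...]):
    """One step of Example 1, hand-translated: returns (new r, new queue, message or None)."""
    head: Optional[str] = queue[0] if queue else None   # None plays the role of ⊥
    enabled_sends = []
    if r == 0 and interesting:
        enabled_sends.append(("this_node",))
    if r == 1 and head is not None:
        enabled_sends.append((head,))
    if r == 1 and head is None:
        enabled_sends.append(())
    msg = enabled_sends[0] if len(enabled_sends) == 1 else None
    new_r = 1 if r == 0 else r          # the only update rule: if r = 0 then r := 1
    return new_r, queue[1:], msg        # every step consumes the head of the queue


def sends_within(k: int, interesting: bool, r: int, queue: Tuple[str, ...]) -> bool:
    """Is a message sent in one of the first k configurations of the local run?"""
    for _ in range(k):
        r, queue, msg = example1_step(interesting, r, queue)
        if msg is not None:
            return True
    return False


starts = list(product([False, True], [0, 1], [(), ("n",), ("n", "m")]))
print(all(sends_within(2, *c) for c in starts))   # True: 2-productive on these starts
print(all(sends_within(1, *c) for c in starts))   # False: not 1-productive
```

The failing case for k = 1 is a non-interesting node with r = 0: its first step only sets r to 1, and the message follows one step later.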


THEOREM 1. Deciding productivity is PSPACE-complete. 


PROOF SKETCH. For containment in PSPACE, consider an 
arbitrary Web automaton A, and suppose that A has (at 
most) ℓ registers. It is easily verified that A is productive iff
A is 3^ℓ-productive. Furthermore, there is a straightforward,
non-deterministic algorithm which accepts a given A iff there
exists a run of A of length 3^ℓ − 1 during which A does not
send a message. This algorithm runs in space polynomial in 
the size of A. Since NPSPACE = PSPACE, we conclude that 
deciding non-productivity is in PSPACE. This immediately 
implies that deciding productivity is in PSPACE as well. 


Hardness for PSPACE is proved via a reduction from a
restriction of FIN-SAT(E+TC), the finite satisfiability problem
for existential transitive-closure logic (see, e.g., [4, 17]). A
formula of the form [TC_{x̄,x̄′} φ](t̄, t̄′) is called simple if
t̄ = 0̄, t̄′ = 1̄, and φ is a quantifier-free formula over the
vocabulary {0, 1, =} of the form φ′ ∧ x̄′ ∈ {0, 1}. The problem
of deciding whether a given simple TC formula has a (finite)
model is PSPACE-complete [17]. We reduce this problem to
the problem of deciding non-productivity.


Consider a simple TC sentence ψ = [TC_{x̄,x̄′} φ](0̄, 1̄), and
suppose that x̄ (and thus x̄′) consists of k variables. One
can define a Web automaton A_ψ which is not productive iff
ψ is satisfiable. The idea is to let A_ψ interpret its queue
as an encoding of a φ-path from 0̄ to 1̄. In k consecutive
steps, A_ψ reads k bits from its queue, stores the bits in
registers x̄′, and then checks whether φ(x̄, x̄′) holds. If the
test is successful, it sets x̄ := x̄′; otherwise, it discards x̄′.
After 2^k repetitions, it knows whether an initial segment of
its queue encodes a path model of ψ, or not. If it finds a
model, it does not send any messages; otherwise, it sends
some dummy message.


4. DISTRIBUTED COMPUTATIONS 


Before we define distributed runs of systems of Web au- 
tomata formally, we provide some intuition. A productive 
automaton A, when started at some source node s in an
instance I, begins by distributing copies of itself to all other
nodes, using a straightforward recursive procedure: upon 
creation at a node n, A equips every node which n links to 
with a copy of itself, except if the node is already equipped 
with a copy. This procedure traces out some spanning tree 
of the link graph of I. Of course, in the ‘real’ Web, it is
virtually impossible to equip all nodes with copies of A. In- 
stead, we propose to visit nodes only up to a certain level 
in the spanning tree. An upper bound on the levels could, 
e.g., be specified by the user. Also, all automata running 
at nodes which are located on the same server may still be 
implemented by a single process running at that server. 
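A sequential sketch of the copy-distribution procedure, including the user-chosen level bound, is given below in Python. The function name `spread_copies` and the FIFO schedule are our own assumptions; in a genuinely concurrent execution, copies may be created in a different order, which can yield a different spanning tree (in the example, c could equally well end up below b).

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple


def spread_copies(links: Set[Tuple[str, str]], s: str, max_level: Optional[int] = None):
    """One sequential schedule of the copy-distribution procedure.

    Returns (parent, level): parent[n] is the node whose automaton created the copy at n.
    Copies are created in FIFO order; max_level (a user-chosen bound) stops the spreading.
    """
    succ: Dict[str, list] = {}
    for a, b in sorted(links):           # sorted only to make the schedule deterministic
        succ.setdefault(a, []).append(b)
    parent: Dict[str, Optional[str]] = {s: None}
    level = {s: 0}
    pending = deque([s])
    while pending:
        n = pending.popleft()
        if max_level is not None and level[n] >= max_level:
            continue                      # n does not equip its successors with copies
        for m in succ.get(n, []):
            if m not in parent:           # skip nodes that are already equipped
                parent[m] = n
                level[m] = level[n] + 1
                pending.append(m)
    return parent, level


links = {("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "s")}
# c is attached below a here, because a's copy is created before b's.
print(spread_copies(links, "s"))
```

The resulting parent map is a spanning tree of the reachable part of the link graph (or of the part within the level bound) rooted at s.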


Once the spanning tree is set up, all automata start running 
concurrently. Each automaton sends its messages to the au- 
tomaton which created it, following a simple protocol based 
on two principles: 


1. Start computing the next message only if you have 
‘enough’ input (see below). 


2. Stop once you have sent a message. 


This naturally organizes a distributed computation in rounds, 
where in each round, every automaton (which is still active) 
sends precisely one message. For instance, automata at leaf 
nodes (of the spanning tree) never receive any messages and 
can thus start a round by computing their own messages 
(in parallel). Each leaf automaton, when finished, sends its 
message to its parent automaton, and then waits for the 
next round. Automata at inner nodes, on the other hand, 
consume messages from their queues each time they move. 
Hence, an inner automaton must wait until it has received 
enough messages (or knows that no new message will arrive 
in this round), such that it can run long enough in order 
to compute its own message. When finished, it also sends 
its message to its parent automaton, and then waits for the 
next round. 


Since our automaton program is productive, every round 
can be performed in parallel time linear in the depth of the 
spanning tree. In every new round, an automaton contin- 
ues its local run where it has stopped during the previous 
round. If an automaton sends the empty message, it exits 
the computation and will not participate in later rounds. If 
the source automaton exits, the whole computation termi- 
nates. The output produced during a computation is the set 
of nodes sent by the source automaton. 
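The round protocol can be illustrated with a small scheduler. In this Python sketch, which is our own and not the formal definition given below, each process is a hypothetical callable that performs one move on its current queue and returns its message together with its remaining queue. The toy `make_flusher` process is not a Web automaton (it sends arbitrarily long messages), but it suffices to show the level-by-level schedule, the appending of messages to the parent's queue, the exit on an empty message, and the output collected at the source.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Message = Tuple
# Hypothetical process interface: one move on the current queue -> (message, remaining queue).
Process = Callable[[Message], Tuple[Message, Message]]


def run_rounds(parent: Dict[str, Optional[str]], level: Dict[str, int],
               procs: Dict[str, Process], source: str, max_rounds: int = 1000) -> Set[str]:
    queues: Dict[str, Message] = {n: () for n in parent}
    active = set(parent)
    output: Set[str] = set()
    for _ in range(max_rounds):
        # One round: every active process moves once, deepest level first.
        for n in sorted(active, key=lambda v: -level[v]):
            msg, queues[n] = procs[n](queues[n])
            if n == source:
                output |= {v for v in msg if isinstance(v, str)}   # nodes, not bits
            else:
                queues[parent[n]] += msg                           # append to parent's queue
            if msg == ():
                active.discard(n)                                  # empty message: exit
        if source not in active:                                   # source exited: stop
            break
    return output


def make_flusher(node: str, interesting: bool) -> Process:
    """Toy process: forwards its whole queue each round, prefixed in the
    first round by its own node if that node is interesting."""
    first = True

    def move(queue: Message) -> Tuple[Message, Message]:
        nonlocal first
        msg = queue
        if first:
            first = False
            if interesting:
                msg = (node,) + queue
        return msg, ()

    return move


parent = {"s": None, "a": "s", "b": "s", "c": "a"}
level = {"s": 0, "a": 1, "b": 1, "c": 2}
procs = {n: make_flusher(n, n in {"a", "c"}) for n in parent}
print(sorted(run_rounds(parent, level, procs, "s")))  # ['a', 'c']
```

Within a level the sketch uses whatever order `sorted` yields; any order is allowed by the protocol, which is exactly what makes soundness a nontrivial property.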


We proceed to the formal definition of distributed runs. Let
I be an instance with node set N and link set L, and let s
be a (source) node in I. We assume that Reach(I, s) = I;
if this is not the case, replace I with Reach(I, s) in what
follows. Let T be a spanning tree of the link graph (N, L)
such that s is the root of T.


A global configuration of A is a mapping γ that assigns to
each node n a configuration of A at n. The initial global
configuration maps each n to (n, ε, 0̄), i.e., the queue ε is
empty and every register holds 0.


Consider two global configurations γ and γ′, and let n be a
node. γ′ is called a successor configuration of γ via a move
at n if the following three conditions hold:

1. There exists a finite local run (C_0, ..., C_r) of A at n
such that (i) C_0 = γ(n), (ii) r ≥ 1, (iii) no message
is sent during this local run before C_{r−1}, (iv) A does
send some message m in C_{r−1}, and (v) C_r = γ′(n).

2. If n ≠ s, let p be the parent of n in T and suppose that
γ(p) = (p, q, ā). Then, γ′(p) = (p, qm, ā). (That is, m
is appended to the queue of the parent automaton.)

3. For every node o different from n and p (if p exists),
γ′(o) = γ(o).


Let d be the depth of T. For every i ∈ {0, ..., d}, let level(i)
denote the set of nodes whose distance from s in T is i. Let
M be a subset of N containing s. An M-round (along T)
is a finite sequence (γ_i)_{i≤k} of global configurations such that

• for every i + 1 ≤ k, γ_{i+1} is a successor configuration of
γ_i via a move at some node n_{i+1},

• the sequence n_1 ... n_k is an enumeration of M, and

• for each i ∈ {0, ..., d} there exists an enumeration ē_i
of level(i) ∩ M such that ē_d ... ē_0 = n_1 ... n_k.


The output produced during this round is the set of nodes 
occurring in the message sent by A at s. 
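The level condition on M-rounds amounts to saying that the nodes of M move in order of non-increasing level, each exactly once. A hypothetical Python checker for the move order n_1 ... n_k of an M-round could look like this:

```python
from typing import Dict, Iterable, List


def is_round_order(order: List[str], m_nodes: Iterable[str],
                   level: Dict[str, int], source: str) -> bool:
    """Check that order = n_1 ... n_k is a legal move order for an M-round:
    s is in M, every node of M moves exactly once, and the sequence splits into
    enumerations of level(d) ∩ M, ..., level(0) ∩ M (deepest level first)."""
    m = set(m_nodes)
    if source not in m or len(order) != len(m) or set(order) != m:
        return False
    levels = [level[n] for n in order]
    return all(a >= b for a, b in zip(levels, levels[1:]))


level = {"s": 0, "a": 1, "b": 1, "c": 2}
print(is_round_order(["c", "b", "a", "s"], {"s", "a", "b", "c"}, level, "s"))  # True
```

Both ["c", "b", "a", "s"] and ["c", "a", "b", "s"] pass, which reflects the freedom in the order of communications within a level.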


A one-round run ρ (on T with source s) is an N-round
which starts with the initial global configuration. Finally,
a multiple-round run ρ* (on T with source s) is a finite or
infinite sequence (ρ_i)_{i∈K} of rounds along the same spanning
tree such that ρ_0 is a one-round run and for every i + 1 ∈ K

• ρ_{i+1} starts with the last configuration of ρ_i, and

• if ρ_i is an M-round and M_0 ⊆ M is the set of nodes at
which A has sent the empty message during ρ_i, then
ρ_{i+1} is an (M − M_0)-round.


Since M-rounds are only defined when s ∈ M, ρ* is finite iff
the root automaton sends the empty message during some
round ρ_i, in which case ρ_i is the last round of ρ*. The output
produced during ρ* is the union of the outputs produced
during the rounds of ρ*.


Notice that, since A was assumed to be productive, condition
(1) in the above definition of a successor configuration
is always satisfied. As a consequence, for every choice of I,
s, and T, there exists a multiple-round run of A on (I, s)
along T.


Remark 2. It is worth noticing that productivity of A is
only a sufficient criterion for the existence of a multiple-round
run of A on any (I, s). In fact, there are many automata
which are not productive in the sense of Definition
2 but which nevertheless always satisfy condition (1), and
thus always have a multiple-round run. Unfortunately, it
is undecidable whether a given automaton satisfies condition
(1) in every possible distributed scenario. We only mention
here that, nevertheless, Definition 2 can be relaxed so that
the obtained notion of productivity becomes strictly weaker,
still enables multiple-round runs, and is also decidable in
PSPACE.


In the remainder of the paper, we focus on productive Web 
automata. 


Example 2. Recall the Web automaton defined in Example 1.
When run on a pair (I, s), this automaton outputs all
‘interesting’ nodes in I reachable from s. In each round, the
source automaton outputs precisely one node. Notice that 
the source automaton does not output any node twice and 
that the output order depends on the choice of the span- 
ning tree and on the order in which the various automata 
communicate during each round. 


Remark 3. Distributed runs as defined in this section are
in fact linearizations of partially-ordered runs in the spirit 
of Gurevich [9]. This explains why the intuitive description 
of distributed computations at the beginning of this section 
may not entirely conform to the formal definition of dis- 
tributed runs: the intuitive description refers to (genuine) 
partially-ordered runs. It is possible to give an alternative 
definition of distributed runs which does not make use of 
linearizations and which can easily be implemented. 


5. AUTOMATA COMPUTING QUERIES 


In general, a Web automaton can exercise many different 
distributed runs on one and the same Web instance (and 
source node). This is because both the choice of the span- 
ning tree and the order of communications during rounds 
can be arbitrary. As a result, different runs may produce 
different output sets. 


Example 3. Consider the following program. For the sake 
of readability, we display the program in a slightly relaxed 
syntax, using nested if-then-else rules (with the obvious 
meaning). 


if (r = 0) then
    r := 1
    if (head_q ≠ ⊥) then
        send(head_q)
    else
        send(this_node)
else
    send()


On the 3-node instance n_1 ← s → n_2, the output can be
either {n_1} or {n_2}, depending on which of the two children of
s gets its message first in the queue of s. By adding the two
links n_1 → n_2 and n_2 → n_1, we obtain also dependence on
the choice of the spanning tree. If the tree s → n_1 → n_2 is
selected, the output is {n_2}, while if the tree s → n_2 → n_1
is selected, the output is {n_1}.
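The order dependence on the 3-node instance n_1 ← s → n_2 can be reproduced with a short simulation. The Python sketch below hand-translates the program of Example 3 (our own encoding: the register r and the queue of each node are stored in a dictionary; the function names are hypothetical) and runs rounds in which the two children of s move in a prescribed order before s.

```python
from typing import Dict, List, Set, Tuple


def example3_move(node: str, state: Dict) -> Tuple:
    """One step of Example 3; every step sends exactly one message, so a move is one step."""
    queue = state["queue"]
    head = queue[0] if queue else None          # None plays the role of ⊥
    state["queue"] = queue[1:]                  # each step consumes the head of the queue
    if state["r"] == 0:
        state["r"] = 1
        return (head,) if head is not None else (node,)
    return ()                                   # send(): the automaton exits


def run_star(children_order: List[str]) -> Set[str]:
    """Multiple-round run on n1 <- s -> n2; the children of s move in the given order."""
    states = {n: {"r": 0, "queue": ()} for n in ["s", *children_order]}
    active = set(states)
    output: Set[str] = set()
    while "s" in active:                        # one iteration = one round
        for n in [c for c in children_order if c in active] + ["s"]:
            msg = example3_move(n, states[n])
            if n == "s":
                output |= set(msg)
            else:
                states["s"]["queue"] += msg     # append to the queue of the parent s
            if msg == ():
                active.discard(n)
    return output


print(run_star(["n1", "n2"]), run_star(["n2", "n1"]))  # {'n1'} {'n2'}
```

In the first round, s forwards only the head of its queue, so whichever child moved first determines the output.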


On the other hand, there are automata whose output is 
independent of the choice of the spanning tree and the order 
of communications. To see an example, consider again the 
automaton in Example 1. Here, the output set is always the 
same, although the output order can differ from run to run. 
This motivates the following definition. 


Definition 3. A Web automaton A is sound if for every
pair (I, s), every multiple-round run of A on (I, s) produces
the same output. In that case, we can speak of the Web
query computed by A, which maps a pair (I, s) to the output
(produced during any run) of A on (I, s).


Unfortunately, we cannot decide whether a given Web au- 
tomaton is sound. (Nevertheless, there exists an interesting 
class of Web automata for which soundness is decidable, as 
we will see in the next section.) 


THEOREM 2. Soundness is undecidable. 


PROOF SKETCH. The proof is by reduction from the emp- 
tiness problem for deterministic one-way two-head automata 
(2-DFAs). It suffices to consider simple 2-DFAs, i.e., 2- 
DFAs whose input alphabet is {0,1} and whose program 
ensures that every computation progresses in two distin- 
guished phases. During the first phase, a simple 2-DFA M 
uses its first input head to scan an initial segment of the 
input tape. The second input head remains idle. After each 
computation step, M may or may not switch to the sec- 
ond phase, depending on its current configuration. If and 
when M switches to the second phase, the first input head is 
placed somewhere on the tape, while the second input head 
is still on the first tape cell. During the second phase, M 
can do whatever 2-DFAs are entitled to do, with the restric- 
tion that, in every computation step, M must move both 
input heads, each one to the next tape cell. A computa- 
tion of M stops if the input is accepted or if the first input 
head reaches the end of the input tape. One can show that 
for simple 2-DFAs the emptiness problem is undecidable (by 
reduction from the word problem for Turing machines). 


Let M be a simple 2-DFA. We denote by Q_true the
Web query which maps a pair (I, s) to the set of those
nodes in I which are reachable from s. One can construct a
Web automaton A_M over {Link} such that

• if L(M) = ∅, then A_M computes Q_true, and
• if A_M is sound, then L(M) = ∅.


This reduces the emptiness problem for simple 2-DFAs to
the problem of deciding soundness. Some details of the
construction follow. In the first round, A_M performs the
following two tasks in parallel. First, it checks whether it is
executed along a spanning tree which has the form of a path.
Second, it pretends that the first test was successful, views
the spanning tree (which is now assumed to be a path) as an
input tape (where link self loops represent set input bits),
and simulates the first phase of M on that input tape. If
the first test fails, the source instance of A_M switches to a
‘forwarding’ mode, which means that in every subsequent
round it just outputs all nodes (reachable from the source
node). The same happens if during the simulation of M the
first input head reaches the end of the (virtual) input tape.


If the source automaton survives the first round without
switching to forwarding mode, then, in all subsequent rounds,
A_M simulates the second phase of M and, in parallel,
outputs all nodes, unless the source automaton discovers
during the simulation that M accepts. In that case,
the source automaton switches to a ‘spoiling’ mode, which
means that it stops outputting nodes and instead sends some
dummy messages.


An indication of the querying power of Web automata is
provided by the next result. We call a Web instance I
locally ordered if it contains a distinguished ternary relation
<, typically written x <_n y, such that for every node n in I,
the binary relation x <_n y is a total order on the children
of n in I.
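Whether a given instance is locally ordered can be checked directly. The Python sketch below assumes, as our own encoding, that the ternary relation is given as triples (n, x, y) meaning x <_n y, and it reads “total order” as a strict total order on the nodes that n links to; the function name is hypothetical.

```python
from itertools import permutations
from typing import Set, Tuple


def is_locally_ordered(links: Set[Tuple[str, str]], order: Set[Tuple[str, str, str]]) -> bool:
    """For every node n, {(x, y) : (n, x, y) in order} must be a strict total
    order on the children of n (the nodes that n links to)."""
    nodes = {a for a, _ in links} | {b for _, b in links} | {n for n, _, _ in order}
    for n in nodes:
        children = {b for a, b in links if a == n}
        rel = {(x, y) for m, x, y in order if m == n}
        if any(x not in children or y not in children or x == y for x, y in rel):
            return False                  # only relates children, and is irreflexive
        for x, y in permutations(children, 2):
            if ((x, y) in rel) == ((y, x) in rel):
                return False              # exactly one of x <_n y and y <_n x
        for x, y, z in permutations(children, 3):
            if (x, y) in rel and (y, z) in rel and (x, z) not in rel:
                return False              # transitivity
    return True


print(is_locally_ordered({("s", "a"), ("s", "b")}, {("s", "a", "b")}))  # True
```

In the Web setting, such an order is naturally provided by the position of links on a page.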


THEOREM 3. Any logarithmic-space computable Web que- 
ry on locally ordered instances is computable by a Web au- 
tomaton. 


PROOF SKETCH. Let φ(x_1, ..., x_k) be a formula of
deterministic transitive-closure logic (see, e.g., [4]). One can
construct a Web automaton A_φ which, on every pair (I, s)
with a locally ordered I, enumerates {ā : Reach(I, s) ⊨
φ[ā]} in the following sense. In every round, A_φ at s sends
either a ‘wait’ message or a message (a_1, ..., a_k) satisfying
φ. Eventually, all tuples satisfying φ are sent by A_φ at
s. The theorem is then implied by a well-known result due
to Immerman [10], namely that a query on finite ordered
structures is logarithmic-space computable iff it is
expressible in deterministic transitive-closure logic (see also [4]).
The construction of A_φ is based on the following
observation. There exists a Web automaton A_enum such that
every multiple-round run (ρ_i)_i of A_enum on (I, s) satisfies
the following three conditions:


1. (ρ_i)_i is infinite.
2. During each ρ_i, there is at most one node output.
3. Let (n_j)_j be the node sequence produced during (ρ_i)_i,
where rounds with empty output are omitted. There
exists an enumeration ē of the reachable nodes such
that (n_j)_j can be seen as an infinite repetition of ē.


A_φ can now be defined by induction on φ. For instance,
if φ(x̄) = R(x̄), then A_φ simulates A_enum, turns the
repetitive enumeration of all nodes into an enumeration of all
k-tuples of nodes, and checks whether R(x̄) holds for each
k-tuple.
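The step from a repetitive enumeration of nodes to an enumeration of k-tuples can be illustrated offline. This Python sketch detects one period of the repetition by waiting until the first node reappears (which works because each period enumerates every reachable node exactly once) and then filters all k-tuples by R. It buffers the whole period in a list, which a Web automaton with finitely many registers cannot do, so it shows only which tuples are enumerated, not how A_φ enumerates them; the function names are hypothetical.

```python
from itertools import cycle, product
from typing import Iterable, List, Set, Tuple


def first_period(stream: Iterable[str]) -> List[str]:
    """Read the repetitive node enumeration until its first node reappears;
    the nodes read so far form one copy of the enumeration ē."""
    it = iter(stream)
    first = next(it)          # the source is always reachable, so the stream is non-empty
    period = [first]
    for n in it:
        if n == first:
            break
        period.append(n)
    return period


def tuples_in_R(stream: Iterable[str], k: int, R: Set[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """All k-tuples of reachable nodes, in product order, that belong to R."""
    period = first_period(stream)
    return [t for t in product(period, repeat=k) if t in R]


print(tuples_in_R(cycle(["a", "b", "c"]), 2, {("a", "b"), ("c", "c")}))  # [('a', 'b'), ('c', 'c')]
```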


An undesirable behavior of Web automata, even of sound 
ones, is that one may have to wait many rounds before seeing 
any new output (e.g., a node which has not been output yet). 
In the worst case, one may even wait only to learn later that 
there is no new output at all. Since each round takes only 
linear parallel time in the depth of the portion of the Web 
which we are exploring, it would be particularly interesting 
to have the following behavior. 


Definition 4. A Web automaton $A$ is called continuous if
every multiple-round run $(\rho_i)_i$ of $A$ satisfies the following
condition: during each round $\rho_i$, except the last round,
at least one node is output which has not been output
during any round preceding $\rho_i$.
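Definition 4 can be checked mechanically on a recorded finite run. The Python helper below is a hypothetical sketch: it assumes the run is given as one list of output nodes per round, with the final list entry being the last round, which is exempt from the condition.

```python
from typing import Hashable, Sequence


def is_continuous_run(rounds: Sequence[Sequence[Hashable]]) -> bool:
    """Check Definition 4 on a recorded run: one output list per round."""
    seen = set()
    for i, out in enumerate(rounds):
        fresh = set(out) - seen
        # Every round except the last must output some node not output before.
        if i < len(rounds) - 1 and not fresh:
            return False
        seen |= set(out)
    return True
```

A run whose second round only repeats an earlier node, such as `[["a"], ["a"], ["b"], []]`, violates the definition even though it eventually produces new output.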


THEOREM 4. Continuity is undecidable. 


PROOF SKETCH. The proof is similar to the proof of Theorem 2.
In fact, $A_M$ can be constructed so that (i) if $L(M) = \emptyset$,
then $A_M$ continuously computes $Q_{\mathrm{true}}$, and (ii) if $A_M$ is
continuous, then $L(M) = \emptyset$. This reduces the emptiness
problem for simple 2-DFAs to the problem of deciding continuity.


Remark 4. Theorems 2 and 4 hold already for automata 
which test only one unary relation (in particular, which do 
not test the link relation). Both theorems remain true if 
we focus on automata which test only the link relation (and 
no other relation). Moreover, undecidability is encountered
even if we restrict our attention to tree-like Web instances
(which have a unique spanning tree). Finally, both theorems
remain true for finite Web automata, i.e., Web automata 
which cannot store nodes in their registers and therefore 
have only a finite number of different internal states. Note 
that the automaton in Example 1 is finite in that sense. 


The next result provides a class of Web queries computable 
by continuous Web automata, in terms of a fragment of 
first-order logic, which we call at-most-at-least logic. The 
fragment may seem artificial at first, but later we will see 
that it is associated to a natural subclass of the class of Web 
automata (see Theorem 8). 


Let $\alpha(x)$ be a quantifier-free formula with $\mathrm{free}(\alpha) = \{x\}$, and
let $k$ be a natural number. Subsequently, we write $(\exists x \in \alpha)$
instead of $\exists x\,\alpha(x)$. An $\alpha$-at-most formula is a formula of the
form $(|\alpha| \le k) \wedge \psi_\alpha(s,x)$ where

• $(|\alpha| \le k)$ abbreviates the first-order formula
$\neg(\exists^{\ge k+1} x \in \alpha)$, and

• $\psi_\alpha(s,x)$ is a boolean combination of formulas of the
form $(\exists y_1 \in \alpha)\cdots(\exists y_r \in \alpha)\,\beta(s,x,\bar y)$ with $\beta$
quantifier-free and $r \ge 0$.

An at-most-at-least formula is a formula of the form
$\alpha(x) \wedge \delta_\alpha(s,x)$ where $\delta_\alpha(s,x)$ is a boolean combination of
$\alpha$-at-most formulas and atomic formulas in $s$.


Example 4. Suppose that the vocabulary $\Upsilon$ contains two
unary relation symbols Red and Green. Verify that the following
formula is a Red-at-most formula over $\Upsilon$:

$|\mathrm{Red}| \le 42 \wedge \neg(\mathrm{Green}(x) \wedge (\exists y \in \mathrm{Red})\,\mathrm{Link}(x,y))$.

The negation of this formula is equivalent to
$|\mathrm{Red}| \le 42 \rightarrow \neg\psi_{\mathrm{Red}}(x)$, where $\psi_{\mathrm{Red}}(x)$ stands for the
second conjunct of the above formula. The following formula is
now an example of an at-most-at-least formula. Intuitively, the
formula says that, if there are more than 42 red nodes, then
output all red nodes; otherwise, output all red nodes which are
also green and which link to another red node.

$\mathrm{Red}(x) \wedge (|\mathrm{Red}| \le 42 \rightarrow \neg\psi_{\mathrm{Red}}(x))$.
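To make the two branches of this formula concrete, the Python sketch below evaluates it directly on a small made-up instance. This is centralized evaluation for illustration, not a Web automaton; relations are modeled as Python sets (an assumption of the sketch), and the threshold is a parameter `k` (42 in the example) so that the "more than $k$ red nodes" branch can be exercised with little data.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def psi_red(x: str, red: Set[str], green: Set[str], link: Set[Edge]) -> bool:
    # psi_Red(x) = not (Green(x) and (exists y in Red) Link(x, y))
    return not (x in green and any((x, y) in link for y in red))


def example4_query(nodes: Set[str], red: Set[str], green: Set[str],
                   link: Set[Edge], k: int = 42) -> Set[str]:
    # Red(x) and (|Red| <= k  ->  not psi_Red(x))
    few_red = len(red) <= k
    return {x for x in nodes
            if x in red and (not few_red or not psi_red(x, red, green, link))}


# A toy instance: three red nodes, one of which is green and links to a red node.
nodes = {"n1", "n2", "n3", "n4"}
red, green = {"n1", "n2", "n3"}, {"n1"}
link = {("n1", "n2"), ("n3", "n4")}
```

With the default threshold only `n1` qualifies, whereas with `k=2` there are more than `k` red nodes and all three red nodes are returned.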


THEOREM 5. Any Web query definable in at-most-at-least 
logic is computable by a continuous Web automaton. 


The proof of this theorem is presented in a more general
context in the appendix (see also Remark 8). The converse
direction of the theorem does not hold, however. For instance,
the query defined by the following formula is continuously
computable, but the formula is not equivalent to any
at-most-at-least formula:

$(\mathrm{Red}(x) \wedge |\mathrm{Red}| > 42) \vee (\mathrm{Green}(x) \wedge |\mathrm{Green}| > 10)$.


Remark 5. There is an interesting variant of Theorem 5,
which reads as follows. A generalized at-most-at-least formula
is simply a boolean combination of at-most formulas and
quantifier-free formulas $\beta(s,x)$. In particular, the at-most
subformulas of a generalized at-most-at-least formula do not
need to be defined w.r.t. the same $\alpha$. One can show that any
Web query definable by a generalized at-most-at-least formula
is computable by a Web automaton with discard action, i.e., a
Web automaton which can discard its queue, in addition to
performing update and send actions.


The next observation gives an example of a query which is 
computable by a Web automaton, but not by a continuous 
one. 


PROPOSITION 1. Let $R$ be a binary relation symbol. The
Web query $Q_R : (\mathcal{I},s) \mapsto \{n : \mathrm{Reach}(\mathcal{I},s) \models R(s,n)\}$ is not
computable by a continuous Web automaton.


Indeed, to be continuous, the source automaton must start
outputting already in the first round. However, by productivity,
it can see only a constant number of nodes in each round. If
the communication order is unfortunate, none of the nodes seen
in the first round qualifies for output, even though qualifying
nodes may exist elsewhere in the reachable part of the Web.


Remark 6. A similar argument shows that the Web query
defined by the at-most-at-least formula $\mathrm{Red}(x) \wedge |\mathrm{Red}| \le 2$,
while computable by a continuous Web automaton, is not
computable by a continuous Web automaton that can send
messages of length at most 1 (recall Remark 1).


6. DECIDE AND FORWARD AUTOMATA 
In this section, we introduce a natural, syntactic subclass of 
the class of Web automata, for which it is decidable whether 
a given automaton is sound and continuous. The subclass 
is closely tied to the at-most-at-least logic defined in the 
previous section. 


A Web automaton is called link-free if the link relation does
not occur in its program. Link-freeness of sound automata
simplifies things considerably, as exemplified by the following
'flat-tree' property. For every pair $(\mathcal{I},s)$, let $\mathrm{flat}(\mathcal{I},s)$
denote the Web instance obtained from $\mathcal{I}$ by changing the
link graph of $\mathcal{I}$ into a flat tree with root $s$, i.e., a tree where
all nodes except $s$ are children of $s$.


PROPOSITION 2. Let $A$ be a sound, link-free Web automaton,
and let $Q$ be the Web query computed by $A$. For every
pair $(\mathcal{I},s)$, $Q(\mathcal{I},s) = Q(\mathrm{flat}(\mathcal{I},s),s)$.


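PROOF. Consider a pair $(\mathcal{I},s)$. Let $\mathcal{I}'$ be obtained from
$\mathcal{I}$ by adding links from $s$ to all other nodes. Every run of
$A$ on $(\mathcal{I},s)$ is also a run of $A$ on $(\mathcal{I}',s)$, and by soundness,
$Q(\mathcal{I},s) = Q(\mathcal{I}',s)$. Likewise, every run of $A$ on $(\mathrm{flat}(\mathcal{I},s),s)$
is also a run of $A$ on $(\mathcal{I}',s)$, hence
$Q(\mathrm{flat}(\mathcal{I},s),s) = Q(\mathcal{I}',s) = Q(\mathcal{I},s)$.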
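The two instance constructions used in this proof only change the link relation. The Python sketch below models a link relation as a set of node pairs (a representation chosen here for illustration) and checks that the link graphs of both $\mathcal{I}$ and $\mathrm{flat}(\mathcal{I},s)$ are contained in that of $\mathcal{I}'$. It does not capture the argument about runs, which relies on link-freeness and soundness.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def flat_links(nodes: Set[str], s: str) -> Set[Edge]:
    """Link relation of flat(I, s): every node other than s is a child of s."""
    return {(s, n) for n in nodes if n != s}


def augmented_links(links: Set[Edge], nodes: Set[str], s: str) -> Set[Edge]:
    """Link relation of I' in the proof: the links of I plus s -> n for all n != s."""
    return links | flat_links(nodes, s)


# A small cyclic instance with source s.
nodes = {"s", "a", "b", "c"}
links = {("s", "a"), ("a", "b"), ("b", "c"), ("c", "a")}
prime = augmented_links(links, nodes, "s")
```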


One can easily show that, on flat trees, every sound Web
automaton computes a logarithmic-space computable query.
The above proposition then implies that any Web query
computable by a link-free Web automaton is logarithmic-space
computable. Whether this also holds in general is an
open problem; the proposition certainly does not hold in
general.


Remark 7. Referring back to the introduction, we point
out (only half seriously, though) that Proposition 2 provides
a kind of a posteriori justification of the way Internet
supercomputing works, where in fact the standard mode of
operation is that all computers which participate in a
distributed computation report directly to a central source
computer.


A decide-and-forward automaton (or DF automaton for short)
is a Web automaton whose program has the form

    if first_round then
        Π
    else
        Π_forward

where first_round is a boolean register initialized with true,
$\Pi$ is a program in which every send rule has the form

    if φ then send(t); first_round := false

where $\varphi$ is a guard of the form $\varphi' \wedge t \notin \{0,1\}$, and $\Pi_{\mathrm{forward}}$
is the program

    if (head_q ≠ ⊥) then
        send(head_q)
    else
        send()

In other words, a DF automaton makes all the crucial decisions
in the first round, sends only nodes (no bits), and acts
in all subsequent rounds (if any) merely as a forwarder that
flushes the remaining contents of the communication queues
to the output.
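The shape of a DF program can be mimicked in a few lines. The Python class below is a toy model under several assumptions not fixed by the text: there is a single queue `q`; sending `head_q` removes it from the queue; the first-round program $\Pi$ is replaced by a hypothetical `decide` callable that returns a node to send or `None` when no send rule fires; and when no send rule fires the model sends the empty message and keeps `first_round` set, as the program shape suggests.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class DFAutomaton:
    """Toy model of the decide-and-forward program shape (single queue q)."""

    def __init__(self, decide: Callable[[Deque[str]], Optional[str]]) -> None:
        self.first_round = True   # boolean register, initialized with true
        self.decide = decide      # hypothetical stand-in for the program Pi

    def step(self, q: Deque[str]) -> Tuple[str, ...]:
        """Return the message sent in the current round."""
        if self.first_round:
            t = self.decide(q)
            if t is not None:     # a send rule of Pi fired; it sends a node, never a bit
                self.first_round = False
                return (t,)
            return ()             # assumption: no send rule fired, so send()
        # Pi_forward: send head_q if the queue is non-empty, else send().
        return (q.popleft(),) if q else ()


# Hypothetical decision: start forwarding only if at least two nodes are waiting.
a = DFAutomaton(lambda q: q.popleft() if len(q) >= 2 else None)
q = deque(["n1", "n2", "n3"])
messages = [a.step(q) for _ in range(4)]
```

After the decision in the first round, the remaining rounds only drain the queue, so `messages` ends with the empty message once the queue is exhausted.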


Note that the program displayed in Example 1 can easily be
rewritten into an equivalent DF form (see also the explanation
in Example 2).


A Web automaton is called monadic if no relation of arity ≥ 2
occurs in its program. In particular, a monadic automaton is
link-free. For monadic DF automata, we obtain the following two
decidability results plus a characterization in terms of monadic
at-most-at-least logic. The corresponding proofs are rather
technical and are sketched in the appendix.


THEOREM 6. For monadic DF automata, emptiness is 
decidable. (In general, emptiness is undecidable.) 


THEOREM 7. The problem of deciding whether a monadic 
DF automaton is sound and continuous is decidable. 


THEOREM 8. A Web query is computable by a continuous
monadic DF automaton iff it is definable in monadic
at-most-at-least logic.


Recall the query which shows that the converse direction of
Theorem 5 does not hold in general (see after Theorem 5).
Since this query is monadic, we obtain the following corollary
of Theorem 8.


COROLLARY 1. Continuous DF automata are strictly weaker
than (general) continuous Web automata.


7. BROWSER STACK MACHINES 


Since the work by Abiteboul and Vianu [2] was one of the main
inspirations for the present paper, we conclude our investigation
by drawing a connection between Web automata and
Abiteboul and Vianu's browser machines. More precisely,
we introduce browser stack machines, a restricted variant of
browser machines, and show that browser stack machines
can be simulated by Web automata in depth-bounded regions
of the Web.


Browser Stack Machines. There are two main restrictions
which we impose on browser machines:

• the work tape is replaced with a finite number of registers,
and

• the browsing tape is organized like a stack, forcing a
machine to explore the Web only by means of the three
familiar surf actions of common Web browsers: 'follow
this link', 'go back', and 'go forward'.


Formally, browser stack machines (BSMs for short) are defined
as follows. Let $\Upsilon$ and $\bar r$ be as in the definition of Web
automata. A guard (of a rule) is a quantifier-free formula
$\varphi(x,\bar r)$ over $\Upsilon \cup \{0,1\}$ with $\mathrm{free}(\varphi) \subseteq \{x,\bar r\}$. (This time, $x$
will denote the stack element which the cursor of a BSM is
currently pointing to.) A BSM program $\Pi$ is a finite set of
rules of the form (if $\varphi$ then action) where $\varphi$ is a guard,
and action is an expression of the form up, down, expand,
$(r_i := t)$ or output($t$). Here, $t$ is a term in $\{x,\bar r,0,1\}$.


Definition 5. A browser stack machine $M$ is a triple $(\Upsilon,\bar r,\Pi)$
consisting of a vocabulary $\Upsilon$, a tuple $\bar r$ of (register)
variables, and a BSM program $\Pi$ over $\Upsilon \cup \{0,1\}$ and $\bar r$.


Runs. Let $M = (\Upsilon,\bar r,\Pi)$ be a BSM, let $\mathcal{I}$ be a locally
ordered instance over $\Upsilon$ (recall our convention prior to
Theorem 3), and let $s$ be a (source) node in $\mathcal{I}$. Subsequently, the
word stack refers to a finite sequence of nodes and 0's. An
occurrence of 0 on a stack will serve as a separator between
different segments of the stack. If $st$ is a stack of length $k$,
then by a cursor on $st$ we mean a natural number between
1 and $k$.


A configuration of $M$ is a quadruple $(st,c,\bar a,O)$ where $st$ is a
stack, $c$ is a cursor on $st$, $\bar a$ is a register assignment (as in the
case of Web automata), and $O$ is a set of nodes. Intuitively,
$O$ is the output produced so far.


Consider a configuration $(st,c,\bar a,O)$. The successor configuration
$(st',c',\bar a',O')$ of this configuration is defined in the
obvious way. Here, we only give some details concerning the
actions up, down, and expand. Suppose that there is precisely
one stack rule in $\Pi$ which is enabled in $(st,c,\bar a,O)$. If
the right-hand side of the rule is up or down, then $st' = st$
and $c'$ is obtained from $c$ as usual. If the right-hand side of
the rule is expand, partition $st$ into $st_{\mathrm{low}}$ and $st_{\mathrm{high}}$ such that
$st = st_{\mathrm{low}}\,st_{\mathrm{high}}$ and the length of $st_{\mathrm{low}}$ is $c$. If $c$ points to a
node, say, $n$, and $\bar e$ denotes the enumeration of all children
of $n$ in the order $x <_n y$, then $st' = st_{\mathrm{low}}\,0\,\bar e$ and $c' = c$.
Otherwise, $st' = st$ and $c' = c$.
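The three stack actions can be sketched on a list-based stack. In the Python sketch below, which is an illustration rather than a full BSM interpreter, the separator is the integer 0, the cursor is 1-based, and the local order $<_n$ is modeled by an ordered list of children per node (all assumptions of the sketch). Guards, registers and output are omitted, and the behavior of up at the top of the stack is an assumption, since the text only says "as usual".

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

SEP = 0                              # separator symbol
Symbol = Union[str, int]             # a node name or the separator
Config = Tuple[List[Symbol], int]    # (stack, 1-based cursor)


def up(st: List[Symbol], c: int) -> Config:
    # Assumption: moving up at the top of the stack leaves the cursor in place.
    return st, min(c + 1, len(st))


def down(st: List[Symbol], c: int) -> Optional[Config]:
    # None signals that the cursor would move below the stack bottom (the run ends).
    return None if c == 1 else (st, c - 1)


def expand(st: List[Symbol], c: int,
           children: Dict[str, Sequence[str]]) -> Config:
    sym = st[c - 1]
    if sym == SEP:                   # cursor on a separator: nothing happens
        return st, c
    st_low = st[:c]                  # st_high = st[c:] is discarded
    return st_low + [SEP] + list(children.get(sym, [])), c


# Ordered child lists play the role of the local orders <_n.
children = {"s": ["a", "b"], "a": ["x"]}
st, c = ["s"], 1
st, c = expand(st, c, children)      # ["s", 0, "a", "b"], cursor on "s"
st, c = up(st, c)                    # cursor on the separator
st, c = up(st, c)                    # cursor on "a"
st, c = expand(st, c, children)      # ["s", 0, "a", 0, "x"]; "b" is discarded
```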


A run $\rho$ of $M$ (in $\mathcal{I}$ with source $s$) is a finite or infinite
sequence $(C_i)_{i \in \kappa}$ of configurations of $M$ such that
$C_0 = (s,1,\bar 0,\emptyset)$ and for every $i+1 \in \kappa$

• $C_{i+1}$ is a successor configuration of $C_i$, and

• $C_{i+1}$ is the last configuration of $\rho$ if $M$ attempts to
move the cursor below the stack bottom in $C_i$.


Note that $\rho$ is uniquely determined by $(\mathcal{I},s)$. We say that
$M$ halts on $(\mathcal{I},s)$ if $\rho$ is finite. In that case, the output
component of the final configuration of $\rho$ is called the output
of $M$ on $(\mathcal{I},s)$. If $M$ halts on every pair $(\mathcal{I},s)$, we can speak
of the Web query computed by $M$, which maps a pair $(\mathcal{I},s)$
to the output of $M$ on $(\mathcal{I},s)$.


The next theorem gives an indication of the querying power 
of BSMs. 


THEOREM 9. Any logarithmic-space computable Web query
on locally ordered instances is computable by a BSM.


The proof of this theorem is similar to the proof of Theorem
3. The only difficult part is to find a BSM which repetitively
enumerates all nodes (similar to $A_{\mathrm{enum}}$ in the proof of
Theorem 3). We omit the details.


Depth-Bounded BSMs. Let $d$ be a natural number. A
BSM $M$ is called $d$-bounded if it maintains a counter of the
number of separators between the stack bottom and the current
cursor position. Whenever this counter equals $d$, $M$
ignores all expand actions.
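Each expand adds exactly one separator directly above the cursor, and it is only performed when fewer than $d$ separators lie below the cursor. A $d$-bounded BSM therefore never holds more than $d$ separators, i.e., at most $d+1$ segments. The Python sketch below reuses a list-based stack representation (0 as separator, ordered child lists for $<_n$, both assumptions of the sketch) and recomputes the counter on the fly instead of storing it.

```python
from typing import Dict, List, Sequence, Tuple, Union

SEP = 0
Symbol = Union[str, int]


def separators_below(st: List[Symbol], c: int) -> int:
    """Number of separators between the stack bottom and cursor position c."""
    return sum(1 for sym in st[:c] if sym == SEP)


def bounded_expand(st: List[Symbol], c: int,
                   children: Dict[str, Sequence[str]],
                   d: int) -> Tuple[List[Symbol], int]:
    """expand as in the unbounded case, but ignored once the counter equals d."""
    sym = st[c - 1]
    if sym == SEP or separators_below(st, c) == d:
        return st, c
    return st[:c] + [SEP] + list(children.get(sym, [])), c


children = {"s": ["a", "b"], "a": ["x"]}
st, c = bounded_expand(["s"], 1, children, d=1)
```

With `d=1`, a second expand at `"a"` is ignored, because one separator already lies below the cursor; with `d=2` it goes through.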


We can now draw a connection between Web automata and 
browser machines. 


THEOREM 10. Any Web query computable by a depth- 
bounded BSM is computable by a Web automaton. 


PROOF SKETCH. Due to Theorem 3, it suffices to show
that any depth-bounded BSM can be simulated by a
logarithmic-space bounded Turing machine (with separate input
and output tapes). Consider a $d$-bounded BSM $M$. We
describe a logarithmic-space bounded Turing machine $T_M$
such that for every input $(\mathcal{I},s)$, $T_M$ on an encoding of $(\mathcal{I},s)$
simulates $M$ on $(\mathcal{I},s)$.


First observe that, because $T_M$ has a separate output tape,
it does not need to store the output of $M$. If $(st,c,\bar a,O)$
is a configuration of $M$ (on some fixed input $(\mathcal{I},s)$), then
the triple $(st,c,\bar a)$ is called a reduced configuration of $M$.
We show that $T_M$ can store a representation of any reduced
configuration in logarithmic space (in the size of $\mathcal{I}$). This
clearly holds for the contents $\bar a$ of the registers of $M$.
Consider the stack $st$. By the $i$-th segment of $st$ we mean the
segment which


• starts with the node following the $(i-1)$-th separator,
and

• ends with the $i$-th separator.


For example, the first segment of any stack consists of the
source node and the first separator. To represent $st$, $T_M$
employs $d$ registers, called stack registers. Each stack register
either holds a node or is undefined, and thus requires only
logarithmic space. During a computation, the $i$-th stack
register holds the last node of the $i$-th segment of $st$ (i.e., the
node before the $i$-th separator). This node was expanded
when the $(i+1)$-th segment was placed on the stack.


To represent the cursor $c$, $T_M$ employs another register,
called the cursor register, and a counter ranging over $\{1,\dots,d+1\}$.
During a computation, the cursor register holds the node
currently read by the cursor; it is undefined iff the cursor
is currently placed on a separator. The counter specifies
in which segment the cursor is currently roaming. Both
the counter and the cursor register require only logarithmic
space. (Verify that, because no node occurs twice in the
same segment, the counter and the cursor register, together
with the stack registers, uniquely determine the position of
the cursor.)
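The representation can be made concrete. The Python sketch below is hypothetical and only illustrates the claim. It uses a list-based stack (0 as separator, 1-based cursor) and ordered child lists standing in for the local orders that $T_M$ reads from its input tape. It computes the $d$ stack registers, the cursor register and the segment counter, and then checks that these suffice to rebuild the stack and to locate the cursor.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

SEP = 0
Symbol = Union[str, int]


def reduce_config(st: List[Symbol], c: int, d: int
                  ) -> Tuple[List[Optional[str]], Optional[str], int]:
    """Stack registers, cursor register and segment counter for (st, c)."""
    regs: List[Optional[str]] = [None] * d
    seps = 0
    for pos, sym in enumerate(st):
        if sym == SEP:
            regs[seps] = st[pos - 1]   # last node of the segment this separator ends
            seps += 1
    cursor_reg = None if st[c - 1] == SEP else st[c - 1]
    counter = 1 + sum(1 for sym in st[:c - 1] if sym == SEP)
    return regs, cursor_reg, counter


def rebuild_stack(regs: List[Optional[str]], source: str,
                  children: Dict[str, Sequence[str]]) -> List[Symbol]:
    """Recompute the stack from the stack registers and the local orders."""
    st: List[Symbol] = [source]
    for i, node in enumerate(regs):
        if node is None:
            break
        st.append(SEP)
        kids = list(children.get(node, []))
        nxt = regs[i + 1] if i + 1 < len(regs) else None
        # The next segment runs up to the next expanded node, or is the full
        # child list if it is the top segment.
        st.extend(kids if nxt is None else kids[:kids.index(nxt) + 1])
    return st


def cursor_position(st: List[Symbol], counter: int,
                    cursor_reg: Optional[str]) -> int:
    """Locate the cursor; unique because no node occurs twice in a segment."""
    seg = 1
    for pos, sym in enumerate(st):
        if seg == counter and (sym == SEP if cursor_reg is None else sym == cursor_reg):
            return pos + 1
        if sym == SEP:
            seg += 1
    raise ValueError("representation does not match the stack")


# A stack reachable by a 2-bounded BSM: expand s, move to n, expand n.
children = {"s": ["a", "b", "n", "c"], "n": ["g", "s"]}
st = ["s", SEP, "a", "b", "n", SEP, "g", "s"]
```

Note that `"s"` occurs twice on this stack, but in different segments, so the counter disambiguates the two occurrences.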


Using this representation of reduced configurations, $T_M$ can
simulate transitions of $M$ from reduced configurations to
reduced configurations in logarithmic space.
APPENDIX


Below, we sketch the proofs of Theorems 6 and 7. The proofs
of Theorems 5 and 8 are the subject of the second part of this
section.


Decidability Results 

Fix a DF automaton $A$. We assume that $A$ is $k$-productive.
A Web instance $\mathcal{I}$ is called tree-like if the link graph of $\mathcal{I}$ is
a tree (where each leaf is reachable from the root). By a run
of $A$ on a tree-like $\mathcal{I}$ we mean a run of $A$ on $(\mathcal{I},r)$ where $r$
is the root of $\mathcal{I}$.


LEMMA 1. Suppose that $A$ is monadic. There exists a
(computable) constant $c_{A,k}$ such that for every one-round
run $\rho$ of $A$ there exists a one-round run $\rho'$ of $A$ on a tree-like
Web instance of size at most $c_{A,k}$ such that the message
sent by the source automaton during $\rho'$ is identical to the
message sent by the source automaton during $\rho$.


Using this observation, one can prove the following theorem. 


THEOREM 11. For monadic Web automata the first-round 
emptiness problem is decidable. 


PROOF OF THEOREM 6. Verify that the identity is a reduction
from the emptiness problem for DF automata to the
first-round emptiness problem for Web automata. Theorem
6 then follows from the above theorem.


The main idea in the proof of Theorem 7 is to reduce the
problem of deciding soundness to the problem of deciding
soundness on 'flat' trees. We call a tree-like $\mathcal{I}$ flat if the link
depth of $\mathcal{I}$, measured from the root, is at most 1. Note that,
for every tree-like $\mathcal{I}$ with root $r$, $\mathrm{flat}(\mathcal{I},r)$ is by definition flat
(recall the definition of $\mathrm{flat}(\mathcal{I},r)$ prior to Proposition 2).


Definition 6. $A$ is flat-tree sound if for every flat $\mathcal{I}$, every
multiple-round run of $A$ on $\mathcal{I}$ produces the same output. $A$
is flat-invariant if it is flat-tree sound and for every tree-like
$\mathcal{I}$, every multiple-round run of $A$ on $\mathcal{I}$ produces the same
output as $A$ on $\mathrm{flat}(\mathcal{I},r)$, where $r$ denotes the root of $\mathcal{I}$.


LEMMA 2. Flat-tree soundness is a decidable property of 
DF automata. 


LEMMA 3. Suppose that A is monadic. A is sound iff A 
is flat-invariant. 


The remainder of the construction concerns a procedure for 
deciding flat-invariance. 


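Alpha Nodes. Let $\alpha(x)$ be a quantifier-free formula with
$\mathrm{free}(\alpha) = \{x\}$ such that for every tree-like $\mathcal{I}$ and for every
leaf node $n$ in $\mathcal{I}$, $\mathcal{I} \models \alpha[n]$ iff $A$ at $n$ sends a non-empty
message during a one-round run of $A$ on $\mathcal{I}$. We call a node
$n$ in $\mathcal{I}$ an $\alpha$-node if $\mathcal{I} \models \alpha[n]$.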


Alpha-Sending Automata. $A$ is called $\alpha$-sending if during
every one-round run of $A$, every non-empty message sent
by $A$ at a non-source node contains pairwise distinct $\alpha$-nodes
only.


LEMMA 4. If $A$ is $\alpha$-sending, then $A$ is continuous.


LEMMA 5. It is decidable whether a given Web automaton
is $\alpha$-sending.


Alpha-Outputting Automata. $A$ is called $\alpha$-outputting
if for every $(\mathcal{I},s)$, every multiple-round run of $A$ on $(\mathcal{I},s)$
produces a subset of $\{s\} \cup N_\alpha$ as output, where $N_\alpha$ is the
set of $\alpha$-nodes in $\mathcal{I}$.


LEMMA 6. If $A$ is monadic and sound, then $A$ is
$\alpha$-outputting.


LEMMA 7. It is decidable whether a given $\alpha$-sending DF
automaton is $\alpha$-outputting.


We call a tree-like $\mathcal{I}$ sparse if there are at most $k$ non-root
$\alpha$-nodes in $\mathcal{I}$.


Definition 7. $A$ is sparse-tree sound if for every sparse $\mathcal{I}$,
every multiple-round run of $A$ on $\mathcal{I}$ produces the same
output.


LEMMA 8. Sparse-tree soundness is a decidable property
of $\alpha$-sending DF automata.


The next lemma is central to the construction. Its proof is 
based on Lemma 1. 


LEMMA 9. Flat-invariance is a decidable property of flat-
and sparse-tree sound, $\alpha$-sending and $\alpha$-outputting, monadic
DF automata.


We are now in a position to sketch the proof of our main
decidability result.


PROOF OF THEOREM 7. Consider a monadic DF automaton $A$.
We call $A$ bounded if for every flat $\mathcal{I}$ with precisely $k$
non-root $\alpha$-nodes, and for every one-round run $\rho$ of $A$ on $\mathcal{I}$,
$\rho$ is terminating (i.e., the source automaton sends the empty
message during $\rho$). Otherwise, we call $A$ unbounded.


First determine whether $A$ is bounded or unbounded (simply
by testing all non-isomorphic small flat trees). Suppose
that $A$ is unbounded. One can show that, if $A$ is sound and
continuous, then $A$ must be $\alpha$-sending. Check whether $A$
is $\alpha$-sending (see Lemma 5). If the test fails, reject $A$.
Otherwise, check whether $A$ is $\alpha$-outputting (see Lemma 7).
If this test fails, reject $A$ (because $A$ is not sound according
to Lemma 6). Otherwise, check whether $A$ is flat- and
sparse-tree sound (see Lemmata 2 and 8). If one of the two
tests fails, reject $A$ (clearly, $A$ cannot be sound in that case).
Otherwise, check whether $A$ is flat-invariant (see Lemma 9).
If this test fails, reject $A$ (because $A$ is not sound according
to Lemma 3). Otherwise, accept $A$, for it is sound and
continuous due to Lemmata 3 and 4.
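The order of the tests matters: Lemma 9 only applies to automata that have already passed the earlier checks. The Python sketch below records this control flow for the unbounded case. Every entry of `test` is a hypothetical placeholder standing for one of the decision procedures cited in the proof, not an implementation of it.

```python
from typing import Callable, Dict


def decide_unbounded(A: object, test: Dict[str, Callable[[object], bool]]) -> bool:
    """Unbounded case: accept iff A is sound and continuous."""
    steps = [
        "alpha_sending",       # Lemma 5; sound and continuous implies alpha-sending
        "alpha_outputting",    # Lemma 7; failure means unsound by Lemma 6
        "flat_tree_sound",     # Lemma 2
        "sparse_tree_sound",   # Lemma 8
        "flat_invariant",      # Lemma 9, applicable only once all tests above pass
    ]
    for name in steps:
        if not test[name](A):
            return False       # reject A
    return True                # accept: sound (Lemma 3) and continuous (Lemma 4)


# Usage with stand-in tests: everything passes except flat-invariance.
stub = {name: (lambda A: True) for name in
        ["alpha_sending", "alpha_outputting", "flat_tree_sound",
         "sparse_tree_sound", "flat_invariant"]}
stub["flat_invariant"] = lambda A: False
verdict = decide_unbounded("A", stub)
```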


Now suppose that $A$ is bounded. Note that $A$ may not
be $\alpha$-sending in this case. An analysis similar to the one
outlined above leads to a decision procedure for bounded
automata.


Expressibility Results 

This second part of the appendix concerns the proofs of
Theorems 5 and 8. In the following, $\Lambda_\alpha(s,x)$ denotes an
$\alpha$-at-most literal, i.e., a formula of the form $|\alpha| \le k \wedge \psi_\alpha(s,x)$
or of the form $|\alpha| \le k \rightarrow \psi_\alpha(s,x)$.


PROPOSITION 3. Any conjunction of $\alpha$-at-most literals (in
the variables $s$ and $x$) is equivalent to an $\alpha$-at-most literal.
The same holds true for disjunctions of $\alpha$-at-most literals.


LEMMA 10. Let $\varphi(s,x)$ be a formula of the form
$\alpha(x) \wedge \Lambda_\alpha(s,x)$. The Web query defined by $\varphi$, $Q_\varphi$, is
computable by a continuous Web automaton.


PROOF SKETCH. Suppose that $\Lambda_\alpha$ is a positive literal, say,
$\Lambda_\alpha = |\alpha| \le k \wedge \psi_\alpha(s,x)$. We briefly describe a continuous
automaton $A_\varphi$ which computes $Q_\varphi$. $A_\varphi$ is $(k+1)$-productive
and sends messages of length at most $k+1$. If $A_\varphi$ is running at
a node different from the source node, it forwards in each
round as many as possible (but at most $k+1$) nodes satisfying
$\alpha$ to its parent automaton. If $A_\varphi$ is running at the
source node, it attempts to see $k+1$ nodes satisfying $\alpha$. If
it succeeds, it sends the empty message, thereby terminating
the computation. Otherwise, it knows all (reachable) nodes
satisfying $\alpha$; in particular, there are at most $k$ such nodes.
For each such node $n$, the source automaton checks whether
$\psi_\alpha(s,n)$ holds and, if so, outputs $n$.


Now suppose that $\Lambda_\alpha$ is a negative literal, say,
$\Lambda_\alpha = |\alpha| \le k \rightarrow \psi_\alpha(s,x)$. Modify $A_\varphi$ as described above so that, if
the source automaton discovers that there are at least $k+1$
nodes satisfying $\alpha$, then, instead of sending the empty
message, it outputs all nodes in its queue, plus the source
node if the source node satisfies $\alpha$.
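The decision taken at the source node in both cases can be summarized as a small function. The Python sketch below is a simplification: it assumes the source has collected the $\alpha$-nodes it can see, including $s$ itself if $s$ satisfies $\alpha$, and it models only that decision. The forwarding by non-source automata and the later rounds in the negative case are not modeled. The parameter `psi` is a hypothetical stand-in for $\psi_\alpha(s,\cdot)$ with $s$ fixed.

```python
from typing import Callable, Iterable, Optional, Set


def source_decision(alpha_seen: Iterable[str], k: int,
                    psi: Callable[[str], bool], positive: bool) -> Optional[Set[str]]:
    """Return None for 'send the empty message', else the set of nodes output."""
    seen = set(alpha_seen)
    if len(seen) >= k + 1:
        # |alpha| <= k is false: a positive literal fails, so terminate;
        # a negative literal holds, so output everything seen (and keep
        # forwarding in later rounds, which this sketch does not model).
        return None if positive else seen
    # Fewer than k+1 alpha-nodes: the source knows all of them.
    return {n for n in seen if psi(n)}


def psi(n: str) -> bool:
    return n == "a"   # hypothetical psi_alpha(s, n)
```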


Color Types. Let $x$ be a variable. A color type in $x$ is a
maximal consistent set of atomic and negated atomic formulas
in $x$. Observe that every quantifier-free formula $\alpha(x)$
with $\mathrm{free}(\alpha) = \{x\}$ is equivalent to a disjunction of color
types in $x$.
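For a monadic vocabulary the observation is a truth-table argument. The Python sketch below assumes that the vocabulary consists only of unary relation symbols, so that apart from the trivial $x = x$ the atomic formulas in $x$ are exactly the $P(x)$. Under that assumption, a color type fixes the truth value of every $P(x)$, and $\alpha(x)$, given here as a Boolean function of the color, is equivalent to the disjunction of the color types that satisfy it.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

ColorType = Dict[str, bool]


def color_types(predicates: Sequence[str]) -> List[ColorType]:
    """All color types in x over unary predicates: P(x) or not P(x) for each P."""
    return [dict(zip(predicates, signs))
            for signs in product([True, False], repeat=len(predicates))]


def as_disjunction(alpha: Callable[[ColorType], bool],
                   predicates: Sequence[str]) -> List[ColorType]:
    """The color types whose disjunction is equivalent to alpha(x)."""
    return [c for c in color_types(predicates) if alpha(c)]


preds = ["Red", "Green"]
red_not_green = as_disjunction(lambda c: c["Red"] and not c["Green"], preds)
red = as_disjunction(lambda c: c["Red"], preds)
```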


PROOF OF THEOREM 5. Let $\varphi(s,x)$ be an at-most-at-least
formula. We construct a continuous automaton $A_\varphi$ which
computes $Q_\varphi$. Suppose that $\varphi(s,x) = \alpha(x) \wedge \delta_\alpha(s,x)$.
Using Proposition 3, one can show that $\delta_\alpha$ is equivalent to a
formula of the form

$\bigvee_i \big(c_i(s) \wedge \Lambda_{\alpha,i}(s,x)\big)$   (1)

where each $c_i(s)$ is a color type in $s$ such that $c_i = c_j$ iff $i = j$.
According to Lemma 10, for each index $i$ in formula (1),
there exists a continuous automaton computing the query defined
by $\alpha(x) \wedge \Lambda_{\alpha,i}(s,x)$. It is now an easy exercise to combine
these automata into a continuous automaton $A_\varphi$ computing $Q_\varphi$.


Remark 8. The proofs of both Lemma 10 and Theorem 5
can be arranged so that the constructed automata are
link-free and DF.


PROOF OF THEOREM 8. Let Q be a Web query. Suppose 
that Q is definable by a monadic at-most-at-least formula. 
According to Theorem 5, Q is computable by a continuous 
Web automaton. By Remark 8, this automaton is monadic 
and DF. 


Now suppose that $Q$ is computable by a continuous monadic
DF automaton $A$. Furthermore, suppose that $A$ is $(k+1)$-productive.
Let $s$ and $x$ be two variables, and let $c_1(s),\dots,c_\ell(s)$ be an
enumeration of all color types in $s$ (over the vocabulary of $A$,
and up to isomorphism). Clearly, $\bigvee_i c_i(s) \equiv (s = s)$. We are
going to construct a quantifier-free formula $\alpha(x)$, and for each
$i \in \{1,\dots,\ell\}$, an $\alpha$-at-most literal $\Lambda_{\alpha,i}(s,x)$ such that the
formula

$\alpha(x) \wedge \bigvee_i \big(c_i(s) \wedge \Lambda_{\alpha,i}(s,x)\big)$

defines $Q$.


Let $\alpha(x)$ be defined as in the previous subsection (see below
Lemma 3). Intuitively, $\alpha$ specifies those (colorings of) leaf
nodes which the source automaton can possibly see during
any computation (recall Lemma 3).


The definition of $\Lambda_{\alpha,i}(s,x)$ is based on various tests revealing
the behavior of $A$ when executed at $c_i$-colored source nodes.
Choose pairwise distinct color types $c'_1(x),\dots,c'_m(x)$ from
the set $\{c_1(x),\dots,c_\ell(x)\}$ so that $\alpha(x) \equiv \bigvee_j c'_j(x)$. Let $\mathcal{I}$ be
a flat instance such that

• the root node $r$ of $\mathcal{I}$ satisfies $c_i(s)$, and

• for each $j \in \{1,\dots,m\}$, there are at least $k+1$ leaf
nodes satisfying $c'_j(x)$.


We are going to execute $A$ at $r$ (in $\mathcal{I}$) on various queues
consisting of $\alpha$-nodes.


By a coordinate $\bar k$ we mean a tuple $(k_1,\dots,k_m)$ such that
$k_1,\dots,k_m \le k+1$. Let $\bar k$ be a coordinate. A $\bar k$-queue is a
sequence of leaf nodes in $\mathcal{I}$ such that

• the length of the sequence is $\sum_j k_j$, and

• for each $j \in \{1,\dots,m\}$, the sequence contains precisely
$k_j$ pairwise distinct nodes satisfying $c'_j(x)$.


Let $q$ be a $\bar k$-queue. We say that $A$ at $r$ accepts $q$ if the first
message sent by $A$ at $r$ on $q$ is not empty (i.e., contains a
node).


Verify that for any two $\bar k$-queues $q$ and $q'$, $A$ at $r$ accepts $q$
iff $A$ at $r$ accepts $q'$. Hence, we can define an $m$-dimensional
table $T_i$ as follows: at coordinate $\bar k$, $T_i$ contains 'accept' if $A$
at $r$ accepts any $\bar k$-queue; otherwise it contains 'reject'. By
$D_i$ we denote the diagonal plane of $T_i$ given by all coordinates
satisfying $\sum_{j=1}^m k_j = k+1$. One can show that $D_i$ has
either only accept entries or only reject entries.
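The bookkeeping behind $T_i$, $D_i$ and the coordinate set used below can be spelled out directly. In the Python sketch below, the predicate `accepts` is a hypothetical stand-in for running $A$ at $r$ on some $\bar k$-queue; everything else follows the definitions above.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Coord = Tuple[int, ...]


def coordinates(m: int, k: int) -> List[Coord]:
    """All coordinates (k_1, ..., k_m) with 0 <= k_j <= k + 1."""
    return list(product(range(k + 2), repeat=m))


def build_table(m: int, k: int, accepts: Callable[[Coord], bool]) -> Dict[Coord, str]:
    """The table T_i: 'accept' or 'reject' at each coordinate."""
    return {c: "accept" if accepts(c) else "reject" for c in coordinates(m, k)}


def diagonal(m: int, k: int) -> List[Coord]:
    """The plane D_i: coordinates with k_1 + ... + k_m = k + 1."""
    return [c for c in coordinates(m, k) if sum(c) == k + 1]


def accepted_below(table: Dict[Coord, str], k: int) -> List[Coord]:
    """Coordinates strictly below the diagonal that carry an accept entry."""
    return [c for c, v in table.items() if sum(c) <= k and v == "accept"]


# Hypothetical behavior with m = 2, k = 1: accept only small queues
# without nodes of the second color type.
T = build_table(2, 1, lambda c: sum(c) <= 1 and c[1] == 0)
```

In this toy table the diagonal carries only reject entries, which is the situation handled first below.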


Next, observe that the definition of $T_i$ does not depend on
the choice of $\mathcal{I}$. We obtain the same table for any flat $\mathcal{I}$
whose root node satisfies $c_i(s)$, and which contains enough
leaf nodes satisfying $c'_j(x)$ (for each $j$). This shows that the
decision of whether $A$ at a $c_i$-colored source node is going to
output or not is entirely determined by the entries on and
below the diagonal plane $D_i$, i.e., all entries at coordinates
satisfying $\sum_j k_j \le k+1$.


Suppose that $D_i$ has only reject entries. Let $S$ be the set of
those coordinates which satisfy $\sum_j k_j \le k$ and where $T_i$ has
an accept entry. Define $\Lambda_{\alpha,i}(s,x)$ to be

$(|\alpha| \le k) \wedge \bigvee_{\bar k \in S} \big(\gamma_{\bar k} \wedge \tau_{\bar k}(s,x)\big)$   (2)

where $\gamma_{\bar k}$ and $\tau_{\bar k}$ are as follows. If $c_i(x)$ does not occur
among $c'_1(x),\dots,c'_m(x)$, set $\gamma_{\bar k} = \bigwedge_j (|c'_j| = k_j)$. Otherwise,
suppose that $c_i(x) = c'_p(x)$, and set
$\gamma_{\bar k} = (|c'_p| = k_p + 1) \wedge \bigwedge_{j \ne p} (|c'_j| = k_j)$. By testing $A$,
one can define $\tau_{\bar k}$ so that it specifies (i) those (colorings of)
nodes in a $\bar k$-queue which are output, and (ii) whether or not $s$
is output.


Now suppose that $D_i$ has only accept entries. In that case,
replace the first conjunction symbol in formula (2) with an
implication symbol.


